147 research outputs found

    Chinese Named Entity Recognition Method for Domain-Specific Text

    Chinese named entity recognition (NER) is a critical task in natural language processing that aims to identify and classify named entities in text. However, the specificity of domain texts and the lack of large-scale labelled datasets mean that NER methods trained on public-domain corpora perform poorly on domain texts. In this paper, a named entity recognition method incorporating sentence semantic information is proposed. It adaptively incorporates sentence semantic information into character semantic information through an attention mechanism and a gating mechanism, enhancing entity feature representation while attenuating the noise generated by irrelevant character information. In addition, to address the lack of large-scale labelled samples, we used data self-augmentation methods to expand the training samples. Furthermore, because the low-quality samples generated by the data self-augmentation process can have a negative impact on the model, we introduced a Weighted Strategy. Experiments on the traditional Chinese medicine (TCM) prescriptions corpus showed that the F1 values of our method exceeded those of the comparison methods.
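
    The gating idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulation, so everything below is an assumption made for illustration: the element-wise gate, the function name `gated_fusion`, and the per-dimension parameters `gate_weights` and `gate_bias` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(char_vec, sent_vec, gate_weights, gate_bias):
    """Blend a character vector with a sentence vector, dimension by dimension.

    Hypothetical formulation (not from the paper):
        g = sigmoid(w_char * c + w_sent * s + b)
        fused = g * c + (1 - g) * s
    A gate near 1 keeps the character feature; a gate near 0 lets the
    sentence-level feature dominate.
    """
    fused = []
    for c, s, (w_char, w_sent), b in zip(char_vec, sent_vec, gate_weights, gate_bias):
        g = sigmoid(w_char * c + w_sent * s + b)
        fused.append(g * c + (1.0 - g) * s)
    return fused


# With zero weights and bias the gate is 0.5, so the result is a plain average.
example = gated_fusion([1.0, 2.0], [3.0, 4.0], [(0.0, 0.0), (0.0, 0.0)], [0.0, 0.0])
```

    In a trained model the gate parameters would be learned, and the sentence vector would typically come from an attention-weighted summary of the sentence; both are left out here.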

    A Named Entity Recognition Method Enhanced with Lexicon Information and Text Local Feature

    Named Entity Recognition (NER) is one of the fundamental tasks for extracting knowledge from traditional Chinese medicine (TCM) texts. The variable length of TCM entities and the linguistic characteristics of TCM texts make TCM entity boundaries ambiguous. In addition, better extraction and use of local text features can improve the accuracy of named entity recognition. In this paper, we propose a TCM NER model enhanced with lexicon information and local text features. In this model, a lexicon is introduced to encode the characters in the text to obtain a context-sensitive global semantic representation of the text. A convolutional neural network (CNN) and a gate joined collaborative attention network form a local text feature extraction module that captures the important semantic features of local text. Experiments were conducted on two TCM domain datasets, and the F1 values were 91.13% and 90.21%, respectively.

    Prompt-based Node Feature Extractor for Few-shot Learning on Text-Attributed Graphs

    Text-attributed Graphs (TAGs) are commonly found in the real world, such as social networks and citation networks, and consist of nodes represented by textual descriptions. Currently, mainstream machine learning methods on TAGs involve a two-stage modeling approach: (1) unsupervised node feature extraction with pre-trained language models (PLMs); and (2) supervised learning using Graph Neural Networks (GNNs). However, we observe that these representations, which have undergone large-scale pre-training, do not significantly improve performance with a limited amount of training samples. The main issue is that existing methods have not effectively integrated information from the graph and downstream tasks simultaneously. In this paper, we propose a novel framework called G-Prompt, which combines a graph adapter and task-specific prompts to extract node features. First, G-Prompt introduces a learnable GNN layer (i.e., an adapter) at the end of PLMs, which is fine-tuned to better capture the masked tokens considering graph neighborhood information. After the adapter is trained, G-Prompt incorporates task-specific prompts to obtain interpretable node representations for the downstream task. Our experimental results demonstrate that the proposed method outperforms current state-of-the-art (SOTA) methods on few-shot node classification. More importantly, in zero-shot settings, the G-Prompt embeddings can not only provide better task interpretability than vanilla PLMs but also achieve comparable performance with fully-supervised baselines.

    Location optimization of fresh food e-commerce front warehouse

    The ongoing emergence of COVID-19 and the maturation of cold chain technology have aided the rapid development of the fresh produce e-commerce industry. Taking into account the characteristics of consumers' demand for fresh products, this paper constructs a location allocation model of a front warehouse for fresh e-commerce with the objective of minimizing the total cost. An improved immune optimization algorithm is proposed, and its effectiveness is demonstrated by a real case study. The results show that the improved immune optimization algorithm outperforms the traditional genetic algorithm in terms of solution accuracy. The proposed location model can effectively help fresh produce e-commerce enterprises open new front warehouses when demand is increasing, and can support economically optimal decisions on front warehouse layout.

    Multiscale modeling for the heterogeneous strength of biodegradable polyesters

    This paper presents a heterogeneous, coupled multiscale strength model for calculating the strength of medical polyesters such as polylactide (PLA), polyglycolide (PGA) and their copolymers during degradation by bulk erosion. The macroscopic device is discretized into an array of mesoscopic cells. A polymer chain is assumed to stay in one cell. As polymer chains undergo scission, the molecular weight, the chain recrystallization induced by chain scission, and the formation of cavities due to polymer cell collapse are found to play different roles in the mechanical strength of the polymer. Therefore, three types of strength phases are proposed to display the heterogeneous strength structures and to represent their different contributions to polymer strength: the amorphous phase, the crystallinity phase and the strength vacancy phase. The strength of the amorphous phase is related to the molecular weight; the strength of the crystallinity phase is related to the molecular weight and the degree of crystallization; and the strength vacancy phase has negligible strength. The strength vacancy phase includes not only the cells with cavity status but also those with amorphous status whose molecular weight is below a threshold value. This heterogeneous strength model is coupled with micro chain scission, chain recrystallization and a macro oligomer diffusion equation to form a multiscale strength model that can simulate the strength phase evolution, cell status evolution, molecular weight, degree of crystallinity, weight loss and device strength during degradation. Different example cases are used to verify this model. The results demonstrate a good fit to experimental data.

    Improving Classification Performance through an Advanced Ensemble Based Heterogeneous Extreme Learning Machines

    Extreme Learning Machine (ELM) is a fast-learning algorithm for a single-hidden layer feedforward neural network (SLFN). It often has good generalization performance. However, it may overfit the training data when it has more hidden nodes than needed. To improve generalization performance, we use a heterogeneous ensemble approach. We propose an Advanced ELM Ensemble (AELME) for classification, which includes Regularized-ELM, L2-norm-optimized ELM (ELML2), and Kernel-ELM. The ensemble is constructed by training a randomly chosen ELM classifier on a subset of training data selected through random resampling. The proposed AELME is evolved by employing an objective function that increases diversity and accuracy within the final ensemble. Finally, the class label of unseen data is predicted using a majority-vote approach. Splitting the training data into subsets and incorporating heterogeneous ELM classifiers result in higher prediction accuracy, better generalization, and a lower number of base classifiers, as compared to other models (Adaboost, Bagging, Dynamic ELM ensemble, data splitting ELM ensemble, and ELM ensemble). The validity of AELME is confirmed through classification on several real-world benchmark datasets.
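
    The resampling and voting steps described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: sampling with replacement is assumed for "random resampling", the helper names `resample` and `majority_vote` are hypothetical, and the ELM base classifiers themselves are omitted (their predictions are passed in as plain label lists).

```python
import random
from collections import Counter


def resample(data, rng):
    """Draw a training subset by sampling with replacement (an assumption;
    the abstract only says 'random resampling')."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions[k][i] is the label that classifier k assigns to sample i.
    Ties go to the label that appears first among the tied classifiers.
    """
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


rng = random.Random(0)
subset = resample(["a", "b", "c", "d"], rng)

# Three hypothetical base classifiers voting on three samples.
final_labels = majority_vote([[0, 1, 1], [0, 0, 1], [1, 0, 1]])
```

    In the ensemble described above, each base classifier would be one of the ELM variants trained on its own resampled subset; only the combination step is shown here.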

    Discovery of a Novel Nav1.7 Inhibitor From Cyriopagopus albostriatus Venom With Potent Analgesic Efficacy

    Spider venoms contain a vast array of bioactive peptides targeting ion channels. A large number of peptides have high potency and selectivity toward sodium channels. Nav1.7 contributes to action potential generation and propagation and participates in the pain signaling pathway. In this study, we describe the identification of μ-TRTX-Ca2a (Ca2a), a novel 35-residue peptide from the venom of the Vietnamese spider Cyriopagopus albostriatus (C. albostriatus) that potently inhibits Nav1.7 (IC50 = 98.1 ± 3.3 nM) with high selectivity against skeletal muscle isoform Nav1.4 (IC50 > 10 μM) and cardiac muscle isoform Nav1.5 (IC50 > 10 μM). Ca2a did not significantly alter the voltage-dependent activation or fast inactivation of Nav1.7, but it hyperpolarized the slow inactivation. Site-directed mutagenesis analysis indicated that Ca2a bound to Nav1.7 at the extracellular S3–S4 linker of domain II. Meanwhile, Ca2a dose-dependently attenuated pain behaviors in rodent models of formalin-induced paw licking, hot plate test, and acetic acid-induced writhing. This study indicates that Ca2a is a potential lead molecule for the development of novel analgesics.

    Deep learning for the rapid automatic segmentation of forearm muscle boundaries from ultrasound datasets

    Ultrasound (US) is widely used in the clinical diagnosis and treatment of musculoskeletal diseases. However, the low efficiency and inconsistency of manual recognition hinder the application and popularization of US for this purpose. Herein, we developed an automatic muscle boundary segmentation tool for US image recognition and tested its accuracy and clinical applicability. Our dataset was constructed from a total of 465 US images of the flexor digitorum superficialis (FDS) from 19 participants (10 men and 9 women, age 27.4 ± 6.3 years). We used the U-net model for US image segmentation. The U-net output often includes several disconnected regions. Anatomically, the target muscle usually has only one connected region. Based on this principle, we designed an algorithm written in C++ to eliminate redundant connected regions from the outputs. The muscle boundary images generated by the tool were compared with those obtained by professionals and junior physicians to analyze their accuracy and clinical applicability. The dataset was divided into five groups for experimentation, and the average Dice coefficient, recall, and accuracy, as well as the intersection over union (IoU) of the prediction set in each group were all about 90%. Furthermore, we propose a new standard to judge the segmentation results. Under this standard, 99% of the 150 images predicted by U-net were rated excellent, which is very close to the segmentation result obtained by professional doctors. In this study, we developed an automatic muscle segmentation tool for US-guided muscle injections. The accuracy of the recognition of the muscle boundary was similar to that of manual labeling by a specialist sonographer, providing a reliable auxiliary tool for clinicians to shorten the US learning cycle, reduce the clinical workload, and improve injection safety.
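
    The post-processing principle in this abstract (keep one connected region) and the Dice coefficient can be sketched as follows. The abstract says the authors' algorithm is written in C++ and gives no details, so this Python version is only an illustration under assumptions: it keeps the largest 4-connected region, and the names `largest_region` and `dice` are hypothetical. Masks are non-empty lists of rows containing 0 or 1.

```python
from collections import deque


def largest_region(mask):
    """Return a copy of a binary mask that keeps only its largest
    4-connected region of 1s (breadth-first flood fill)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    cleaned = [[0] * cols for _ in range(rows)]
    for y, x in best:
        cleaned[y][x] = 1
    return cleaned


def dice(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    overlap = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    return 2 * overlap / total if total else 1.0


raw = [[1, 1, 0, 0],
       [1, 0, 0, 1],
       [0, 0, 0, 0]]
cleaned = largest_region(raw)  # the isolated pixel at row 1, column 3 is dropped
```

    Whether 4- or 8-connectivity is appropriate, and whether the largest region is always the target muscle, depends on the data; the abstract does not say which rule the authors used.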